Database is down every day at the same time

  • lepaob
    Junior Member
    • Nov 2012
    • 7

    #1

    Database is down every day at the same time

    Hello,

    Every day at the same time (around 15:55), I get the message "Zabbix database is down", then my ping checks on various servers switch to PROBLEM and very quickly return to OK.

    I looked at the cron jobs but found nothing.

    I don't understand why it always happens at the same time.

    Any ideas?

    Thanks in advance.
  • JBo
    Senior Member
    • Jan 2011
    • 310

    #2
    Hello,

    Is there anything in zabbix-server.log?

    What value does the HousekeepingFrequency parameter have in zabbix_server.conf?
    If it is set to 24, that could explain a database slowdown every 24 hours.

    Regards,
    JBo


    • lepaob
      Junior Member
      • Nov 2012
      • 7

      #3
      In the log I have:

      12739:20121125:145130.613 Executing housekeeper
      12739:20121125:145154.820 Deleted 116855 records from history and trends
      12743:20121125:145345.737 item [wfr15svad1001.wonderbox.lan:systemStateProcessorDeviceStatusCombined] became not supported: wfr15svad1001.wonderbox.lan: [-2] Name or service not known
      12726:20121125:145353.087 SNMP item [systemStateGlobalSystemStatus] on host [wfr15svad1001.wonderbox.lan] failed: first network error, wait for 15 seconds
      12728:20121125:145357.394 Zabbix agent item [system.cpu.util[,iowait,avg1]] on host [compiere.wonderbox.lan] failed: first network error, wait for 15 seconds
      12731:20121125:145403.097 enabling SNMP checks on host [WFR15SVBO1102.wonderbox.lan]: host became available
      12727:20121125:145407.163 Zabbix agent item [proc.num[,,run]] on host [srv-ocs] failed: first network error, wait for 15 seconds
      12726:20121125:145408.290 Zabbix agent item [system.cpu.util[,nice,avg1]] on host [srv-ocs] failed: another network error, wait for 15 seconds
      12744:20121125:145410.752 item [wfr15svad1001.wonderbox.lan:systemStateEventLogStatusCombined] became not supported: wfr15svad1001.wonderbox.lan: [-2] Name or service not known
      12731:20121125:145413.386 enabling SNMP checks on host [Color Factory]: host became available
      12725:20121125:145415.578 [Z3001] connection to database 'zabbix' failed: [2005] Unknown MySQL server host 'wdntsql1.wonderbox.lan' (1)
      12725:20121125:145415.578 watchdog: database is down
      12729:20121125:145425.830 Zabbix agent item [vm.memory.size[cached]] on host [neo-03.wonderbox.lan] failed: first network error, wait for 15 seconds
      12726:20121125:145428.216 Zabbix agent item [system.cpu.util[,nice,avg1]] on host [compiere.wonderbox.lan] failed: another network error, wait for 15 seconds
      12731:20121125:145429.494 resuming SNMP checks on host [wfr15svad1001.wonderbox.lan]: connection restored
      12745:20121125:145430.774 item [wfr15svad1001.wonderbox.lan:systemStateMemoryDeviceStatusCombined] became not supported: wfr15svad1001.wonderbox.lan: [-2] Name or service not known
      12745:20121125:145430.775 item [wfr15svad1001.wonderbox.lan:systemStateTeperatureStatusCombined] became not supported: wfr15svad1001.wonderbox.lan: [-2] Name or service not known
      12729:20121125:145432.658 Zabbix agent item [system.cpu.util[,nice,avg1]] on host [testcompiere.wonderbox.lan] failed: first network error, wait for 15 seconds
      12730:20121125:145432.659 Zabbix agent item [system.cpu.util[,user,avg1]] on host [db-replica.wonderbox.lan] failed: first network error, wait for 15 seconds
      12731:20121125:145432.660 resuming Zabbix agent checks on host [srv-ocs]: connection restored
      12728:20121125:145432.660 Zabbix agent item [system.cpu.util[,idle,avg1]] on host [neo-03.wonderbox.lan] failed: another network error, wait for 15 seconds
      12728:20121125:145439.767 Zabbix agent item [net.if.in[10.33.0.15,bytes]] on host [filer.wonderbox.lan] failed: first network error, wait for 15 seconds
      12730:20121125:145439.768 Zabbix agent item [system.cpu.load[,avg5]] on host [wbx-kasp.wonderbox.lan] failed: first network error, wait for 15 seconds
      12729:20121125:145439.768 Zabbix agent item [system.cpu.load[,avg5]] on host [WFR15SVTOIP1003.wonderbox.lan] failed: first network error, wait for 15 seconds
      12727:20121125:145450.457 temporarily disabling Zabbix agent checks on host [compiere.wonderbox.lan]: host unavailable
      No log handling enabled - using stderr logging
      getaddrinfo: wfr15svad1001.wonderbox.lan Name or service not known
      12745:20121125:145505.834 item [wfr15svad1001.wonderbox.lan:systemStateChassisIntrusionStatusCombined] became not supported: Error doing snmp_open()
      12729:20121125:145505.888 Zabbix agent item [net.if.out[10.33.0.76,bytes]] on host [wfr15svsql1006.wonderbox.lan] failed: first network error, wait for 15 seconds
      12731:20121125:145506.469 Zabbix agent item [proc.num[]] on host [neo-03.wonderbox.lan] failed: another network error, wait for 15 seconds
      12731:20121125:145508.024 resuming Zabbix agent checks on host [testcompiere.wonderbox.lan]: connection restored
      12731:20121125:145508.027 resuming Zabbix agent checks on host [db-replica.wonderbox.lan]: connection restored
      12731:20121125:145508.049 resuming Zabbix agent checks on host [filer.wonderbox.lan]: connection restored
      12731:20121125:145508.052 resuming Zabbix agent checks on host [wbx-kasp.wonderbox.lan]: connection restored
      12731:20121125:145508.055 resuming Zabbix agent checks on host [WFR15SVTOIP1003.wonderbox.lan]: connection restored
      12745:20121125:145515.845 item [supervision.wonderbox.lan:web.page.regexp[wfr15svpr1002.wonderbox.lan,/tib/monitor.asp?id=405,80,"^[0-9]*$",2]] became not supported: Type of received value [EOF] is not suitable for value type [Numeric (unsigned)]
      12727:20121125:145521.614 Zabbix agent item [vm.memory.size[free]] on host [WFR15SVBO1102.wonderbox.lan] failed: first network error, wait for 15 seconds
      12731:20121125:145529.251 resuming Zabbix agent checks on host [wfr15svsql1006.wonderbox.lan]: connection restored
      12728:20121125:145529.252 Zabbix agent item [proc.num[]] on host [neo-02.wonderbox.lan] failed: another network error, wait for 15 seconds
      12729:20121125:145529.252 Zabbix agent item [net.if.in[eth1,bytes]] on host [neo-02.wonderbox.lan] failed: first network error, wait for 15 seconds
      12730:20121125:145529.253 Zabbix agent item [vfs.file.cksum[/usr/bin/ssh]] on host [srv-ocs] failed: first network error, wait for 15 seconds
      12731:20121125:145529.253 resuming Zabbix agent checks on host [neo-03.wonderbox.lan]: connection restored
      12726:20121125:145529.260 Zabbix agent item [kernel.maxfiles] on host [testcompiere.wonderbox.lan] failed: first network error, wait for 15 seconds
      12728:20121125:145543.031 Zabbix agent item [vm.memory.size[free]] on host [wfr15svad1001.wonderbox.lan] failed: first network error, wait for 15 seconds
      12729:20121125:145543.056 Zabbix agent item [system.swap.size[,free]] on host [Color Factory] failed: first network error, wait for 15 seconds
      12730:20121125:145544.916 Zabbix agent item [system.cpu.load[,avg5]] on host [compensation.wonderbox.lan] failed: first network error, wait for 15 seconds
      12731:20121125:145546.315 Zabbix agent item [vm.memory.size[free]] on host [WFR15SVBO1102.wonderbox.lan] failed: another network error, wait for 15 seconds
      12727:20121125:145551.713 Zabbix agent item [system.cpu.load[,avg1]] on host [compensation.wonderbox.lan] failed: another network error, wait for 15 seconds
      12730:20121125:145555.053 Zabbix agent item [service_state[Dhcp]] on host [Color Factory] failed: another network error, wait for 15 seconds
      12728:20121125:145555.923 Zabbix agent item [system.cpu.load[,avg5]] on host [WFR15SVTOIP1004.wonderbox.lan] failed: first network error, wait for 15 seconds
      12731:20121125:145555.923 Zabbix agent item [system.cpu.load[,avg1]] on host [testcompiere.wonderbox.lan] failed: another network error, wait for 15 seconds
      12731:20121125:145555.929 resuming Zabbix agent checks on host [srv-ocs]: connection restored
      12731:20121125:145555.932 resuming Zabbix agent checks on host [neo-02.wonderbox.lan]: connection restored
      12731:20121125:145555.946 enabling Zabbix agent checks on host [compiere.wonderbox.lan]: host became available
      12744:20121125:145556.005 item [wfr15svad1001.wonderbox.lan:systemStatePowerSupplyStatusCombined] became not supported: wfr15svad1001.wonderbox.lan: [-2] Name or service not known
      12730:20121125:145601.442 Zabbix agent item [perf_counter["\SQLServer:Buffer Manager\Database pages"]] on host [wfr15svsql1006.wonderbox.lan] failed: first network error, wait for 15 seconds
      12728:20121125:145601.444 Zabbix agent item [proc.num[]] on host [wfr15svsql1006.wonderbox.lan] failed: another network error, wait for 15 seconds
      12731:20121125:145604.971 resuming Zabbix agent checks on host [wfr15svad1001.wonderbox.lan]: connection restored
      12731:20121125:145604.977 resuming Zabbix agent checks on host [WFR15SVBO1102.wonderbox.lan]: connection restored
      12731:20121125:145610.123 resuming Zabbix agent checks on host [compensation.wonderbox.lan]: connection restored
      12731:20121125:145610.129 resuming Zabbix agent checks on host [testcompiere.wonderbox.lan]: connection restored
      12731:20121125:145610.140 resuming Zabbix agent checks on host [Color Factory]: connection restored
      12731:20121125:145610.146 resuming Zabbix agent checks on host [WFR15SVTOIP1004.wonderbox.lan]: connection restored
      12731:20121125:145616.151 resuming Zabbix agent checks on host [wfr15svsql1006.wonderbox.lan]: connection restored



      Otherwise, in zabbix_server.conf:

      ### Option: HousekeepingFrequency
      # How often Zabbix will perform housekeeping procedure (in hours).
      # Housekeeping is removing unnecessary information from history, alert, and alarms tables.
      #
      # Mandatory: no
      # Range: 1-24
      # Default:
      # HousekeepingFrequency=1

      ### Option: MaxHousekeeperDelete
      # The table "housekeeper" contains "tasks" for housekeeping procedure in the format:
      # [housekeeperid], [tablename], [field], [value].
      # No more than 'MaxHousekeeperDelete' rows (corresponding to [tablename], [field], [value])
      # will be deleted per one task in one housekeeping cycle.
      # SQLite3 does not use this parameter, deletes all corresponding rows without a limit.
      # If set to 0 then no limit is used at all. In this case you must know what you are doing!
      #
      # Mandatory: no
      # Range: 0-1048576
      # Default:
      # MaxHousekeeperDelete=500

      ### Option: DisableHousekeeping
      # If set to 1, disables housekeeping.
      #
      # Mandatory: no
      # Range: 0-1
      # Default:
      # DisableHousekeeping=0


      I don't think anything is enabled for housekeeping, but the log does show:

      12739:20121125:145130.613 Executing housekeeper

      4 min before my "Database is down" problem.

      Thanks for your help in any case.

      Cheers
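To check how far apart the two events really are, the timestamps can be subtracted directly. A minimal Python sketch, assuming every server log line starts with a `PID:YYYYMMDD:HHMMSS.mmm` prefix as in the excerpt above; `parse_zabbix_timestamp` is a hypothetical helper name, not part of Zabbix:

```python
from datetime import datetime


def parse_zabbix_timestamp(line: str) -> datetime:
    """Parse the 'PID:YYYYMMDD:HHMMSS.mmm' prefix of a server log line."""
    prefix = line.split(" ", 1)[0]            # e.g. '12739:20121125:145130.613'
    _pid, date_part, time_part = prefix.split(":")
    return datetime.strptime(date_part + time_part, "%Y%m%d%H%M%S.%f")


# Two lines taken from the log posted in this thread.
housekeeper = "12739:20121125:145130.613 Executing housekeeper"
db_down = "12725:20121125:145415.578 watchdog: database is down"

gap = parse_zabbix_timestamp(db_down) - parse_zabbix_timestamp(housekeeper)
gap_seconds = gap.total_seconds()
print(f"{gap_seconds:.3f} s between housekeeper start and 'database is down'")
```

For the two lines quoted here the gap comes out at roughly 165 seconds, a little under three minutes.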


      • JBo
        Senior Member
        • Jan 2011
        • 310

        #4
        Originally posted by lepaob

        I don't think anything is enabled for housekeeping, but the log does show:

        12739:20121125:145130.613 Executing housekeeper

        4 min before my "Database is down" problem.

        Thanks for your help in any case.

        Cheers
        According to the documentation and the comments in zabbix_server.conf:
        • if HousekeepingFrequency is not set, it defaults to 1, so the housekeeper runs every hour.
        • if DisableHousekeeping is not set, it defaults to 0, so housekeeping is enabled.
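The effect of commented-out options can be illustrated with a small Python sketch. It assumes the defaults shown in the comments of the zabbix_server.conf excerpt posted earlier (1, 500 and 0); `effective_settings` and `DEFAULTS` are illustrative names, not part of Zabbix:

```python
# Defaults as stated in the config comments quoted in this thread (assumption:
# they match the installed server version).
DEFAULTS = {
    "HousekeepingFrequency": "1",
    "MaxHousekeeperDelete": "500",
    "DisableHousekeeping": "0",
}


def effective_settings(conf_text: str, defaults: dict) -> dict:
    """Return the values in effect: defaults, overridden by uncommented lines."""
    settings = dict(defaults)
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue                      # commented-out options change nothing
        key, sep, value = line.partition("=")
        if sep and key.strip() in settings:
            settings[key.strip()] = value.strip()
    return settings


sample = """\
### Option: HousekeepingFrequency
# Default:
# HousekeepingFrequency=1

### Option: DisableHousekeeping
# Default:
# DisableHousekeeping=0
"""

result = effective_settings(sample, DEFAULTS)
print(result)
```

With every option still commented out, the result is simply the defaults: housekeeping enabled, running once per hour.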

        The two lines:
        Code:
        12725:20121125:145415.578 [Z3001] connection to database 'zabbix' failed: [2005] Unknown MySQL server host 'wdntsql1.wonderbox.lan' (1)
        12725:20121125:145415.578 watchdog: database is down
        indicate that Zabbix cannot connect to wdntsql1.wonderbox.lan, with the error 'Unknown MySQL server host'.
        I assume Zabbix is not installed on the same server as MySQL. Could there be network problems, DNS in particular?
        Does the problem persist if the name wdntsql1.wonderbox.lan is replaced by its IP address in zabbix_server.conf?
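One way to test the DNS hypothesis is to resolve the database host name at regular intervals and note when resolution fails. A rough Python sketch using the standard `socket.getaddrinfo` call; the `resolves` and `watch` helpers, the interval and the number of rounds are assumptions for illustration:

```python
import socket
import time
from datetime import datetime


def resolves(hostname: str) -> bool:
    """Return True if the system resolver can resolve the host name."""
    try:
        socket.getaddrinfo(hostname, None)
        return True
    except socket.gaierror:
        return False


def watch(hostname, interval_seconds=10.0, rounds=3, resolver=resolves):
    """Try to resolve `hostname` `rounds` times; return timestamps of failures."""
    failures = []
    for i in range(rounds):
        if not resolver(hostname):
            stamp = datetime.now().isoformat(timespec="seconds")
            failures.append(stamp)
            print(f"{stamp} cannot resolve {hostname}")
        if i < rounds - 1:
            time.sleep(interval_seconds)   # wait before the next attempt
    return failures
```

Running something like `watch("wdntsql1.wonderbox.lan", interval_seconds=10, rounds=8640)` on the Zabbix server for a day would show whether the failures cluster around the time of the "database is down" message.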

        Regards,
        JBo


        • lepaob
          Junior Member
          • Nov 2012
          • 7

          #5
          Server log:

          12725:20121125:145415.578 [Z3001] connection to database 'zabbix' failed: [2005] Unknown MySQL server host 'wdntsql1.wonderbox.lan' (1)
          12725:20121125:145415.578 watchdog: database is down
          12729:20121125:145425.830 Zabbix agent item [vm.memory.size[cached]] on host [neo-03.wonderbox.lan] failed: first network error, wait for 15 seconds
          12726:20121125:145428.216 Zabbix agent item [system.cpu.util[,nice,avg1]] on host [compiere.wonderbox.lan] failed: another network error, wait for 15 seconds
          12731:20121125:145429.494 resuming SNMP checks on host [wfr15svad1001.wonderbox.lan]: connection restored
          12745:20121125:145430.774 item [wfr15svad1001.wonderbox.lan:systemStateMemoryDeviceStatusCombined] became not supported: wfr15svad1001.wonderbox.lan: [-2] Name or service not known
          12745:20121125:145430.775 item [wfr15svad1001.wonderbox.lan:systemStateTeperatureStatusCombined] became not supported: wfr15svad1001.wonderbox.lan: [-2] Name or service not known
          12729:20121125:145432.658 Zabbix agent item [system.cpu.util[,nice,avg1]] on host [testcompiere.wonderbox.lan] failed: first network error, wait for 15 seconds
          12730:20121125:145432.659 Zabbix agent item [system.cpu.util[,user,avg1]] on host [db-replica.wonderbox.lan] failed: first network error, wait for 15 seconds
          12731:20121125:145432.660 resuming Zabbix agent checks on host [srv-ocs]: connection restored
          12728:20121125:145432.660 Zabbix agent item [system.cpu.util[,idle,avg1]] on host [neo-03.wonderbox.lan] failed: another network error, wait for 15 seconds
          12728:20121125:145439.767 Zabbix agent item [net.if.in[10.33.0.15,bytes]] on host [filer.wonderbox.lan] failed: first network error, wait for 15 seconds
          12730:20121125:145439.768 Zabbix agent item [system.cpu.load[,avg5]] on host [wbx-kasp.wonderbox.lan] failed: first network error, wait for 15 seconds
          12729:20121125:145439.768 Zabbix agent item [system.cpu.load[,avg5]] on host [WFR15SVTOIP1003.wonderbox.lan] failed: first network error, wait for 15 seconds
          12727:20121125:145450.457 temporarily disabling Zabbix agent checks on host [compiere.wonderbox.lan]: host unavailable
          No log handling enabled - using stderr logging
          getaddrinfo: wfr15svad1001.wonderbox.lan Name or service not known
          12745:20121125:145505.834 item [wfr15svad1001.wonderbox.lan:systemStateChassisIntrusionStatusCombined] became not supported: Error doing snmp_open()
          12729:20121125:145505.888 Zabbix agent item [net.if.out[10.33.0.76,bytes]] on host [wfr15svsql1006.wonderbox.lan] failed: first network error, wait for 15 seconds
          12731:20121125:145506.469 Zabbix agent item [proc.num[]] on host [neo-03.wonderbox.lan] failed: another network error, wait for 15 seconds
          12731:20121125:145508.024 resuming Zabbix agent checks on host [testcompiere.wonderbox.lan]: connection restored
          12731:20121125:145508.027 resuming Zabbix agent checks on host [db-replica.wonderbox.lan]: connection restored
          12731:20121125:145508.049 resuming Zabbix agent checks on host [filer.wonderbox.lan]: connection restored
          12731:20121125:145508.052 resuming Zabbix agent checks on host [wbx-kasp.wonderbox.lan]: connection restored
          12731:20121125:145508.055 resuming Zabbix agent checks on host [WFR15SVTOIP1003.wonderbox.lan]: connection restored
          12745:20121125:145515.845 item [supervision.wonderbox.lan:web.page.regexp[wfr15svpr1002.wonderbox.lan,/tib/monitor.asp?id=405,80,"^[0-9]*$",2]] became not supported: Type of received value [EOF] is not suitable for value type [Numeric (unsigned)]
          12727:20121125:145521.614 Zabbix agent item [vm.memory.size[free]] on host [WFR15SVBO1102.wonderbox.lan] failed: first network error, wait for 15 seconds
          12731:20121125:145529.251 resuming Zabbix agent checks on host [wfr15svsql1006.wonderbox.lan]: connection restored
          12728:20121125:145529.252 Zabbix agent item [proc.num[]] on host [neo-02.wonderbox.lan] failed: another network error, wait for 15 seconds
          12729:20121125:145529.252 Zabbix agent item [net.if.in[eth1,bytes]] on host [neo-02.wonderbox.lan] failed: first network error, wait for 15 seconds
          12730:20121125:145529.253 Zabbix agent item [vfs.file.cksum[/usr/bin/ssh]] on host [srv-ocs] failed: first network error, wait for 15 seconds
          12731:20121125:145529.253 resuming Zabbix agent checks on host [neo-03.wonderbox.lan]: connection restored
          12726:20121125:145529.260 Zabbix agent item [kernel.maxfiles] on host [testcompiere.wonderbox.lan] failed: first network error, wait for 15 seconds
          12728:20121125:145543.031 Zabbix agent item [vm.memory.size[free]] on host [wfr15svad1001.wonderbox.lan] failed: first network error, wait for 15 seconds
          12729:20121125:145543.056 Zabbix agent item [system.swap.size[,free]] on host [Color Factory] failed: first network error, wait for 15 seconds
          12730:20121125:145544.916 Zabbix agent item [system.cpu.load[,avg5]] on host [compensation.wonderbox.lan] failed: first network error, wait for 15 seconds
          12731:20121125:145546.315 Zabbix agent item [vm.memory.size[free]] on host [WFR15SVBO1102.wonderbox.lan] failed: another network error, wait for 15 seconds
          12727:20121125:145551.713 Zabbix agent item [system.cpu.load[,avg1]] on host [compensation.wonderbox.lan] failed: another network error, wait for 15 seconds
          12730:20121125:145555.053 Zabbix agent item [service_state[Dhcp]] on host [Color Factory] failed: another network error, wait for 15 seconds
          12728:20121125:145555.923 Zabbix agent item [system.cpu.load[,avg5]] on host [WFR15SVTOIP1004.wonderbox.lan] failed: first network error, wait for 15 seconds


          And the config:

          ### Option: HousekeepingFrequency
          # How often Zabbix will perform housekeeping procedure (in hours).
          # Housekeeping is removing unnecessary information from history, alert, and alarms tables.
          #
          # Mandatory: no
          # Range: 1-24
          # Default:
          # HousekeepingFrequency=1

          ### Option: MaxHousekeeperDelete
          # The table "housekeeper" contains "tasks" for housekeeping procedure in the format:
          # [housekeeperid], [tablename], [field], [value].
          # No more than 'MaxHousekeeperDelete' rows (corresponding to [tablename], [field], [value])
          # will be deleted per one task in one housekeeping cycle.
          # SQLite3 does not use this parameter, deletes all corresponding rows without a limit.
          # If set to 0 then no limit is used at all. In this case you must know what you are doing!
          #
          # Mandatory: no
          # Range: 0-1048576
          # Default:
          # MaxHousekeeperDelete=500

          ### Option: DisableHousekeeping
          # If set to 1, disables housekeeping.
          #
          # Mandatory: no
          # Range: 0-1
          # Default:
          # DisableHousekeeping=0


          I don't think the housekeeper is enabled.

          Cheers


          • lepaob
            Junior Member
            • Nov 2012
            • 7

            #6
            Indeed, I found the same information about the housekeeper, but I don't think it is the cause: according to the log it runs often, and I only get the error once a day. I'll try replacing the host name with the IP address.

            Thanks for your help, I'll keep you posted.

            Thanks

            Cheers
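The comparison between how often the housekeeper runs and how often the database is reported down can be tallied per day from the log. A small Python sketch, assuming the same `PID:YYYYMMDD:HHMMSS.mmm` line prefix as in the excerpts above; `count_events_per_day` and `PATTERNS` are hypothetical names:

```python
from collections import Counter

# Substrings to look for, taken from log lines quoted in this thread.
PATTERNS = {
    "housekeeper": "Executing housekeeper",
    "db_down": "watchdog: database is down",
}


def count_events_per_day(lines):
    """Count matching log lines, keyed by (YYYYMMDD, label)."""
    counts = Counter()
    for line in lines:
        parts = line.split(":", 2)
        if len(parts) < 3 or not parts[1].isdigit():
            continue                      # no PID:date:time prefix, e.g. stderr output
        day = parts[1]
        for label, needle in PATTERNS.items():
            if needle in line:
                counts[(day, label)] += 1
    return counts


sample = [
    "12739:20121125:145130.613 Executing housekeeper",
    "12725:20121125:145415.578 [Z3001] connection to database 'zabbix' failed: "
    "[2005] Unknown MySQL server host 'wdntsql1.wonderbox.lan' (1)",
    "12725:20121125:145415.578 watchdog: database is down",
    "No log handling enabled - using stderr logging",
]

counts = count_events_per_day(sample)
for (day, label), n in sorted(counts.items()):
    print(day, label, n)
```

Fed the full log file instead of the short sample, many housekeeper runs per day against a single "database is down" would support the point that the housekeeper is unlikely to be the trigger.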


